{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IqM-T1RTzY6C"
      },
      "source": [
        "# Unsloth Fine-tuning DeepSeek R1 Distilled Llama 8B\n",
        "\n",
        "In this notebook, it will demonstrate how to finetune `DeepSeek-R1-Distill-Llama-8B` with Unsloth, using a medical dataset."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PqXgNATbyHlU"
      },
      "source": [
        "## Why do we need LLM fine-tuning?\n",
        "\n",
        "Fine-tuning tailors the model to have a better performance for specific tasks, making it more effective and versatile in real-world applications. This process is essential for tailoring an existing model to a particular task or domain."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "2eSvM9zX_2d3"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "!pip install unsloth\n",
        "# Also get the latest nightly Unsloth!\n",
        "!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git\n",
        "!pip install bitsandbytes unsloth_zoo"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "r2v_X2fA0Df5"
      },
      "source": [
        "\n",
        "## Choose a Base Model\n",
        "\n",
        "1. Choose a model that aligns with your usecase\n",
        "2. Assess your storage, compute capacity and dataset\n",
        "3. Select a Model and Parameters\n",
        "4. Choose Between Base and Instruct Models"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "6hcLGeGQsHjP"
      },
      "outputs": [],
      "source": [
        "pip install triton==3.2.0"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "QmUBVEnvCDJv"
      },
      "outputs": [],
      "source": [
        "from unsloth import FastLanguageModel\n",
        "import torch\n",
        "max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!\n",
        "dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+\n",
        "load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.\n",
        "\n",
        "model, tokenizer = FastLanguageModel.from_pretrained(\n",
        "    model_name = \"unsloth/DeepSeek-R1-Distill-Llama-8B\",\n",
        "    max_seq_length = max_seq_length,\n",
        "    dtype = dtype,\n",
        "    load_in_4bit = load_in_4bit,\n",
        "    # token = \"hf_...\", # use one if using gated models like meta-llama/Llama-2-7b-hf\n",
        ")"
      ]
    },
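    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A quick sanity check (plain PyTorch / Transformers attributes, nothing Unsloth-specific): confirm which GPU Colab assigned and which dtype was auto-detected."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Confirm the GPU and the dtype the model was loaded with.\n",
        "print(torch.cuda.get_device_name(0))\n",
        "print(model.dtype)"
      ]
    },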
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SXd9bTZd1aaL"
      },
      "source": [
        "## Inference before fine-tuning"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "lvKN1w6sfW0O"
      },
      "outputs": [],
      "source": [
        "prompt_style = \"\"\"Below is an instruction that describes a task, paired with an input that provides further context.\n",
        "Write a response that appropriately completes the request.\n",
        "Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n",
        "\n",
        "### Instruction:\n",
        "You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.\n",
        "Please answer the following medical question.\n",
        "\n",
        "### Question:\n",
        "{}\n",
        "\n",
        "### Response:\n",
        "<think>{}\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "2AV54NuTfNv_"
      },
      "outputs": [],
      "source": [
        "question = \"一个患有急性阑尾炎的病人已经发病5天，腹痛稍有减轻但仍然发热，在体检时发现右下腹有压痛的包块，此时应如何处理？\"\n",
        "\n",
        "\n",
        "FastLanguageModel.for_inference(model)\n",
        "inputs = tokenizer([prompt_style.format(question, \"\")], return_tensors=\"pt\").to(\"cuda\")\n",
        "\n",
        "outputs = model.generate(\n",
        "    input_ids=inputs.input_ids,\n",
        "    attention_mask=inputs.attention_mask,\n",
        "    max_new_tokens=1200,\n",
        "    use_cache=True,\n",
        ")\n",
        "response = tokenizer.batch_decode(outputs)\n",
        "print(response[0].split(\"### Response:\")[1])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vITh0KVJ10qX"
      },
      "source": [
        "## Prepare Dataset\n",
        "\n",
        "A medical dataset [https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT/](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT/) will be used to train the selected model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "4dcpgiM9d21z"
      },
      "outputs": [],
      "source": [
        "train_prompt_style = \"\"\"Below is an instruction that describes a task, paired with an input that provides further context.\n",
        "Write a response that appropriately completes the request.\n",
        "Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n",
        "\n",
        "### Instruction:\n",
        "You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.\n",
        "Please answer the following medical question.\n",
        "\n",
        "### Question:\n",
        "{}\n",
        "\n",
        "### Response:\n",
        "<think>\n",
        "{}\n",
        "</think>\n",
        "{}\"\"\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xYTHheORxHTk"
      },
      "source": [
        "### Important Notice\n",
        "\n",
        "It's crucial to add the EOS (End of Sequence) token at the end of each training dataset entry, otherwise you may encounter infinite generations."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "WasZ83hyd5F8"
      },
      "outputs": [],
      "source": [
        "EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN\n",
        "\n",
        "\n",
        "def formatting_prompts_func(examples):\n",
        "    inputs = examples[\"Question\"]\n",
        "    cots = examples[\"Complex_CoT\"]\n",
        "    outputs = examples[\"Response\"]\n",
        "    texts = []\n",
        "    for input, cot, output in zip(inputs, cots, outputs):\n",
        "        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN\n",
        "        texts.append(text)\n",
        "    return {\n",
        "        \"text\": texts,\n",
        "    }"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "HvOPfPnet76H"
      },
      "outputs": [],
      "source": [
        "from datasets import load_dataset\n",
        "dataset = load_dataset(\"FreedomIntelligence/medical-o1-reasoning-SFT\", 'zh', split = \"train[0:100]\", trust_remote_code=True)\n",
        "print(dataset.column_names)"
      ]
    },
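    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Before formatting, it's worth inspecting one raw example to see the three columns (`Question`, `Complex_CoT`, `Response`) that `formatting_prompts_func` above expects:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Peek at one raw example; truncate long fields for readability.\n",
        "sample = dataset[0]\n",
        "for column in [\"Question\", \"Complex_CoT\", \"Response\"]:\n",
        "    print(f\"--- {column} ---\")\n",
        "    print(sample[column][:200])"
      ]
    },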
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xg4_dG-m0Cz4"
      },
      "source": [
        "For `Ollama` and `llama.cpp` to function like a custom `ChatGPT` Chatbot, we must only have 2 columns - an `instruction` and an `output` column. We need to transform the dataset into proper structure."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "F33v7dB0d8js"
      },
      "outputs": [],
      "source": [
        "dataset = dataset.map(formatting_prompts_func, batched = True)\n",
        "dataset[\"text\"][0]"
      ]
    },
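    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Following the Important Notice above, a minimal check that the EOS token really was appended to every formatted example:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Each formatted example must end with the EOS token, or generation may never stop.\n",
        "assert all(text.endswith(EOS_TOKEN) for text in dataset[\"text\"]), \"EOS token missing!\"\n",
        "print(dataset[\"text\"][0][-120:])  # the tail should end with the EOS token"
      ]
    },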
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "idAEIeSQ3xdS"
      },
      "source": [
        "## Train the model\n",
        "Now let's use Huggingface TRL's `SFTTrainer`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "WjQ3ButQrLJm"
      },
      "outputs": [],
      "source": [
        "model = FastLanguageModel.get_peft_model(\n",
        "    model,\n",
        "    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128\n",
        "    target_modules = [\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n",
        "                      \"gate_proj\", \"up_proj\", \"down_proj\",],\n",
        "    lora_alpha = 16,\n",
        "    lora_dropout = 0, # Supports any, but = 0 is optimized\n",
        "    bias = \"none\",    # Supports any, but = \"none\" is optimized\n",
        "    # [NEW] \"unsloth\" uses 30% less VRAM, fits 2x larger batch sizes!\n",
        "    use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" for very long context\n",
        "    random_state = 3407,\n",
        "    use_rslora = False,  # We support rank stabilized LoRA\n",
        "    loftq_config = None, # And LoftQ\n",
        ")"
      ]
    },
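    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A quick look at how small the LoRA update is. This assumes the object returned by `get_peft_model` exposes the standard Hugging Face PEFT helper `print_trainable_parameters`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# With r = 16 LoRA adapters, only a small fraction of the weights is trainable.\n",
        "model.print_trainable_parameters()"
      ]
    },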
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "95_Nn-89DhsL"
      },
      "outputs": [],
      "source": [
        "from trl import SFTTrainer\n",
        "from transformers import TrainingArguments\n",
        "from unsloth import is_bfloat16_supported\n",
        "trainer = SFTTrainer(\n",
        "    model = model,\n",
        "    tokenizer = tokenizer,\n",
        "    train_dataset = dataset,\n",
        "    dataset_text_field = \"text\",\n",
        "    max_seq_length = max_seq_length,\n",
        "    dataset_num_proc = 2,\n",
        "    packing = False, # Can make training 5x faster for short sequences.\n",
        "    args = TrainingArguments(\n",
        "        per_device_train_batch_size = 2,\n",
        "        gradient_accumulation_steps = 4,\n",
        "        warmup_steps = 5,\n",
        "        max_steps = 60,\n",
        "        # num_train_epochs = 1, # For longer training runs!\n",
        "        learning_rate = 2e-4,\n",
        "        fp16 = not is_bfloat16_supported(),\n",
        "        bf16 = is_bfloat16_supported(),\n",
        "        logging_steps = 1,\n",
        "        optim = \"adamw_8bit\",\n",
        "        weight_decay = 0.01,\n",
        "        lr_scheduler_type = \"linear\",\n",
        "        seed = 3407,\n",
        "        output_dir = \"outputs\",\n",
        "        report_to = \"none\", # Use this for WandB etc\n",
        "    ),\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "yqxqAZ7KJ4oL"
      },
      "outputs": [],
      "source": [
        "trainer_stats = trainer.train()"
      ]
    },
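    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Optionally, report the training runtime and peak GPU memory. `trainer_stats.metrics` is the standard metrics dict returned by `Trainer.train()`, and the memory figure uses a plain `torch.cuda` call."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Summarize the run: wall-clock time and peak reserved GPU memory.\n",
        "runtime = trainer_stats.metrics[\"train_runtime\"]\n",
        "peak_gb = torch.cuda.max_memory_reserved() / 1024**3\n",
        "print(f\"Training took {runtime:.1f} seconds ({runtime/60:.1f} minutes).\")\n",
        "print(f\"Peak reserved GPU memory: {peak_gb:.2f} GB\")"
      ]
    },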
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ekOmTR1hSNcr"
      },
      "source": [
        "## Inference after fine-tuning\n",
        "\n",
        "Let's inference with same question again and see the difference."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "05jKLSaYvCaZ"
      },
      "outputs": [],
      "source": [
        "print(question)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "background_save": true
        },
        "id": "kR3gIAX-SM2q"
      },
      "outputs": [],
      "source": [
        "FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!\n",
        "inputs = tokenizer([prompt_style.format(question, \"\")], return_tensors=\"pt\").to(\"cuda\")\n",
        "\n",
        "outputs = model.generate(\n",
        "    input_ids=inputs.input_ids,\n",
        "    attention_mask=inputs.attention_mask,\n",
        "    max_new_tokens=1200,\n",
        "    use_cache=True,\n",
        ")\n",
        "response = tokenizer.batch_decode(outputs)\n",
        "print(response[0].split(\"### Response:\")[1])"
      ]
    },
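    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because the training prompt wraps the reasoning in `<think>...</think>` tags, the final answer can be separated from the chain of thought. A minimal sketch, assuming the model emits a closing `</think>` tag as it was trained to:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Split the decoded response into chain-of-thought and final answer.\n",
        "full_response = response[0].split(\"### Response:\")[1]\n",
        "if \"</think>\" in full_response:\n",
        "    thoughts, answer = full_response.split(\"</think>\", 1)\n",
        "    print(\"Final answer only:\\n\", answer.strip())\n",
        "else:\n",
        "    print(full_response)  # no closing tag found; print everything"
      ]
    },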
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5GBJX1V3wyCK"
      },
      "source": [
        "## Upload Model to HuggingFace\n",
        "\n",
        "Now, let's save our finetuned model and upload it to HuggingFace."
      ]
    },
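    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Before exporting to GGUF, you can optionally keep a local copy of just the LoRA adapters. This uses the standard `save_pretrained` call from Transformers/PEFT; `\"lora_model\"` is an arbitrary directory name chosen here:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Save only the LoRA adapter weights and the tokenizer locally (small files).\n",
        "model.save_pretrained(\"lora_model\")\n",
        "tokenizer.save_pretrained(\"lora_model\")"
      ]
    },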
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a-RykxXixLmj"
      },
      "source": [
        "### Save the fine-tuned model to GGUF format\n",
        "\n",
        "Choose the llama.cpp's GGUF format we prefer by setting the corresponding `if` to `True`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "o-on6ebhBZMg"
      },
      "outputs": [],
      "source": [
        "from google.colab import userdata\n",
        "\n",
        "HUGGINGFACE_TOKEN = userdata.get('HUGGINGFACE_TOKEN')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "id": "WisaI4QOw07u"
      },
      "outputs": [],
      "source": [
        "# Save to 8bit Q8_0\n",
        "if True: model.save_pretrained_gguf(\"model\", tokenizer,)\n",
        "\n",
        "# Save to 16bit GGUF\n",
        "if False: model.save_pretrained_gguf(\"model_f16\", tokenizer, quantization_method = \"f16\")\n",
        "\n",
        "# Save to q4_k_m GGUF\n",
        "if False: model.save_pretrained_gguf(\"model\", tokenizer, quantization_method = \"q4_k_m\")"
      ]
    },
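    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Optionally, smoke-test the exported file locally with llama.cpp's `llama-cli`. The filename below is illustrative; check the `model` directory for the actual GGUF file Unsloth wrote:\n",
        "\n",
        "```bash\n",
        "./llama-cli -m model/unsloth.Q8_0.gguf -p \"Hello\"\n",
        "```"
      ]
    },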
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tcuthG09JF3G"
      },
      "source": [
        "### Push the model to HuggingFace"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g4AwRcgzO6PF"
      },
      "source": [
        "Create a model type repository for your model if you haven't done so."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "RGkksjx2MO9S"
      },
      "outputs": [],
      "source": [
        "from huggingface_hub import create_repo\n",
        "create_repo(\"Siriustl666/medical-model\", token=HUGGINGFACE_TOKEN, exist_ok=True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tCyft7eVBk2b"
      },
      "outputs": [],
      "source": [
        "model.push_to_hub_gguf(\"Siriustl666/medical-model\", tokenizer, token = HUGGINGFACE_TOKEN)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BK_zr8PHy3Lj"
      },
      "source": [
        "### Ollama run HuggingFace model\n",
        "\n",
        "```bash\n",
        "ollama run hf.co/{username}/{repository}:{quantization}\n",
        "```"
      ]
    },
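    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For the repository pushed above, and assuming the default `Q8_0` quantization produced by `push_to_hub_gguf`, the command would look like:\n",
        "\n",
        "```bash\n",
        "ollama run hf.co/Siriustl666/medical-model:Q8_0\n",
        "```"
      ]
    },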
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yl5Qoz2HzSUq"
      },
      "source": [
        "### Ollama inference\n",
        "\n",
        "```bash\n",
        "curl http://localhost:11434/api/chat -d '{ \\\n",
        "  \"model\": \"\", \\\n",
        "  \"messages\": [ \\\n",
        "    { \"role\": \"user\", \"content\": \"一个患有急性阑尾炎的病人已经发病5天，腹痛稍有减轻但仍然发热，在体检时发现右下腹有压痛的包块，此时应如何处理？\" } \\\n",
        "  }'\n",
        "```"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "L4",
      "machine_shape": "hm",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}